Dataset pipelines #163
bt datasets pipeline run ./pipeline.ts --limit 100

# Staged execution for inspection or agent editing.
bt datasets pipeline fetch ./pipeline.ts --limit 500
Why is this called fetch here when the enum is called pull? Can we make the naming consistent?
Good catch. At some point I renamed fetch to pull, and some lingering references were left behind.
    Ok((ctx, client, project))
}

fn discovery_filter(
Should we add a timestamp filter of some kind, just to constrain the queries here?
Good catch.
Added a --window argument, which defaults to 1d and is always ANDed into the discovery filter.
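As a rough illustration of a window that is always ANDed in, here is a minimal sketch using only the Rust standard library. It is not the code from this PR: the helper names `parse_window` and `windowed_filter`, the accepted unit suffixes, and the `created >= <unix seconds>` clause syntax are all assumptions made for the example.

```rust
use std::time::Duration;

/// Parse a window such as "1d", "12h", "30m", or "45s" into a Duration.
/// The suffix set is an assumption; the real flag may accept other forms.
fn parse_window(s: &str) -> Result<Duration, String> {
    let s = s.trim();
    let unit = s.chars().last().ok_or_else(|| "empty window".to_string())?;
    let digits = &s[..s.len() - unit.len_utf8()];
    let amount: u64 = digits
        .parse()
        .map_err(|_| format!("invalid window amount: {s:?}"))?;
    let secs_per_unit: u64 = match unit {
        's' => 1,
        'm' => 60,
        'h' => 3_600,
        'd' => 86_400,
        other => return Err(format!("unknown window unit: {other:?}")),
    };
    amount
        .checked_mul(secs_per_unit)
        .map(Duration::from_secs)
        .ok_or_else(|| "window too large".to_string())
}

/// Combine an optional user filter with the time-window clause.
/// The window clause is always present, so every query is bounded in time.
/// The clause syntax is a placeholder, not the real query language.
fn windowed_filter(user_filter: Option<&str>, window: Duration, now_unix: u64) -> String {
    let cutoff = now_unix.saturating_sub(window.as_secs());
    let window_clause = format!("created >= {cutoff}");
    match user_filter.map(str::trim).filter(|f| !f.is_empty()) {
        Some(f) => format!("({f}) AND {window_clause}"),
        None => window_clause,
    }
}
```

Wrapping the user filter in parentheses keeps an `OR` inside it from escaping the time bound, which is the point of always ANDing the window in.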
/// Maximum number of source refs to discover
#[arg(
    long,
    alias = "target",
I would prefer that we not have this alias. The name limit aligns better with the rest of the CLI options, and target could be confusing because here it refers to source refs rather than the final row count (while options like --target-dataset refer to the output).
Agreed.
    );
}

datasets_api::create_dataset_with_metadata(
Auto-creating datasets like this might be surprising behaviour. It means that folks can run into issues if they make a spelling mistake or something similar. I don't feel strongly about this, but being explicit about it might be better for the agents.
I generally agree, but our SDKs auto-create projects and datasets, so I think that, as is, this is more consistent with our current semantics.
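To make the trade-off in this thread concrete, here is a small, purely illustrative sketch of "create if missing" versus "fail if missing" lookup. It uses an in-memory map as a stand-in for the dataset service; `DatasetRegistry`, `Resolved`, and `resolve` are hypothetical names and do not correspond to `datasets_api` or to any SDK.

```rust
use std::collections::HashMap;

/// In-memory stand-in for a dataset service; purely illustrative.
#[derive(Default)]
struct DatasetRegistry {
    ids_by_name: HashMap<String, u64>,
    next_id: u64,
}

/// Whether the lookup found an existing dataset or had to create one.
#[derive(Debug, PartialEq)]
enum Resolved {
    Existing(u64),
    Created(u64),
}

impl DatasetRegistry {
    /// Look up a dataset by name. When it is missing, create it only if
    /// `create_if_missing` is set; otherwise return an error.
    fn resolve(&mut self, name: &str, create_if_missing: bool) -> Result<Resolved, String> {
        if let Some(&id) = self.ids_by_name.get(name) {
            return Ok(Resolved::Existing(id));
        }
        if !create_if_missing {
            return Err(format!("dataset {name:?} not found"));
        }
        self.next_id += 1;
        let id = self.next_id;
        self.ids_by_name.insert(name.to_string(), id);
        Ok(Resolved::Created(id))
    }
}
```

With `create_if_missing` set, a misspelled name silently yields a new dataset, which is the surprise the reviewer describes. Returning `Created` separately from `Existing` is one way a caller could at least report that a dataset was created.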